A Study of Virtual Memory MTU Reassembly within the PowerPC Architecture
Abstract
Emerging network technologies, such as Asynchronous Transfer Mode (ATM), have been the focus of manufacturing efforts aimed at producing fully integrated, high-performance solutions [1]. Chipsets like the IDT77201 NICStAR [6], for example, support segmentation and reassembly of ATM cells, hardware checksumming, and DMA protocols that specify sequential cell placement in predefined, possibly discontiguous, hardware pages. The latter feature eliminates the need for an operating system (OS) to copy data between host memory and a network card during data transmissions, yielding what is known as a zero-copy interface. For a small message transfer unit (MTU, also known as a maximum transfer unit or a hardware-layer protocol data unit, PDU), this DMA approach offers substantial performance gains. Many modern networks, however, have been defined to support large MTU sizes. The ATM specification, for example, allows MTUs up to 64 KB in length [11]. FDDI calls for MTUs of 4,352 bytes, and IP over ATM requires 9,180 bytes [13]. When large MTUs are streamed into a receiver's memory on an architecture with small page sizes, overall I/O performance can actually drop. This is because the OS needs to concatenate and reassemble the pages belonging to an MTU before it can send it up to the protocol stack for additional processing. Systems that implement reassembly using copy semantics suffer the most, since they require an additional slow memory copy (memcpy) at the device driver level. General issues involving memory-to-memory copies [5] and network-to-memory copies [3] have been known to represent a significant portion of the overhead associated with network data transport services. Much of the literature, however, has tended to concentrate on maximizing buffer reassembly performance at the protocol layers, where messages (MTUs) are combined into an application data unit, or ADU. MTU reassembly has, on the other hand, received far less attention.
A widely adopted ad hoc solution is to copy each fragmented MTU into a separate, pre-allocated, contiguous, MTU-sized buffer. This method is not only wasteful in terms of memory usage and CPU time, but also slow. Using a similar strategy, Traw and Smith studied data movement from a fast ATM card they designed to system and user spaces in the AURORA Testbed environment [2]. Their resource-inefficient technique required a kernel pre-allocation of two 64 KB contiguous pinned buffers. One of the inevitable findings of that research was that overall bandwidth was highly dependent on data copy performance. Virtual memory remapping has become a logical solution to the copy problem. It is an attractive scheme because the system "moves" data by altering page table entries rather than performing physical copies [4]. Tzou and Anderson measured the impact of remapping virtual addresses in message-passing environments [11]. Their study showed that the use of buffers and preallocated virtual and physical address regions limited VM remapping performance gains over data copy. As a result, the authors suggested limiting generalized use of the virtual address space by mapping communications buffers to a fixed virtual address range shared by all processes. This solution is not general enough, however, for use in fast, next-generation communications systems. Other papers, such as [12], provide similar performance observations but only offer additional non-generalized buffer-based solutions. Druschel and Peterson recently proposed another such technique, known as fbufs, or fast buffers, to mitigate VM remapping expenses [12]. Fbufs are allocated by the operating system from a shared memory pool and sent to device drivers or applications for incoming and outgoing data. The buffers are allocated and reused in such a way as to minimize TLB and cache flushes, memory management unit (MMU) updates, and protection domain traversal overheads.
An implementation of fbufs in Solaris by Thadani and Khalidi showed that network throughput improved by more than 40% and CPU utilization dropped by more than 20% [10].
Similar Resources
Virtual memory in contemporary microprocessors
Virtual memory is a technique for managing the resource of physical memory. It gives an application the illusion of a very large amount of memory, typically much larger than what is actually available. It protects the code and data of user-level applications from the actions of other programs but also allows programs to share portions of their address spaces if desired. It supports the executio...
Semiotics of Collective Memory of the Iran-Iraq War (Holy Defence): A Case Study of the Shared Images in Virtual Social Networks
This study aims to achieve a semiotic understanding of collective memory of the Iran-Iraq war. For this purpose, samples of images in virtual social networks shared in response to the news of discovery and return of the bodies of more than 175 divers have been analyzed. Visual signs in photographs, cartoons, graphic designs, prints, paintings and posters, in methods of historical pictures and f...
Segmented Addressing Solves the Virtual Cache Synonym Problem
If one is interested solely in processor speed, one must use virtually-indexed caches. The traditional purported weakness of virtual caches is their inability to support shared memory. Many implementations of shared memory are at odds with virtual caches—ASID aliasing and virtual-address aliasing (techniques used to provide shared memory) can cause false cache misses and/or give rise to data in...
Untyped Memory in the Java Virtual Machine
We have implemented a virtual execution environment that executes legacy binary code on top of the type-safe Java Virtual Machine by recompiling native code instructions to type-safe bytecode. As it is essentially impossible to infer static typing into untyped machine code, our system emulates untyped memory on top of Java’s type system. While this approach allows to execute native code on any ...
SRC: a multicore NPU-based TCP stream reassembly card for deep packet inspection
Stream reassembly is the premise of deep packet inspection, regarded as the core function of network intrusion detection system and network forensic system. As moving packet payload from one block of memory to another is essential for the reason of packet disorder, throughput performance is very vital in stream reassembly design. In this paper, a stream reassembly card (SRC) is designed to impr...
Publication date: 1997